Project - Personal Loan Campaign


Context:

Objective:

Data Information

The records contain each customer's personal information along with their bank account details and usage patterns. The detailed data dictionary is given below:

Data Dictionary


Importing the necessary packages:



Unwrapping the Customer Information:



Data Preprocessing


Dropping the Customer ID Column

Fixing the Experience data

Converting the ZipCode to Ranges so that we can convert it to a Category type
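One possible way to do this conversion is sketched below. The column name `ZIPCode`, the sample values, and the choice of grouping by the first two digits are all assumptions made for illustration; the notebook's actual ranges may differ.

```python
import pandas as pd

# Hypothetical sample data; the real column name and values may differ.
df = pd.DataFrame({"ZIPCode": [91107, 90089, 94720, 94112, 92121, 95616]})

# Integer-divide by 1000 so every 5-digit ZIP code falls into a
# 1000-wide range identified by its first two digits (e.g. 94720 -> "94xxx").
df["ZIPCode_Group"] = (df["ZIPCode"] // 1000).astype(str) + "xxx"

# Store the ranges as a Category type instead of raw integers.
df["ZIPCode_Group"] = df["ZIPCode_Group"].astype("category")

print(df["ZIPCode_Group"].cat.categories.tolist())
```

Grouping this way keeps the number of categories small, which makes the attribute easier to plot and to encode for modeling.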

Checking the data types of the columns


Fixing the data types


Performing the Sanity Check


Summary of Data Analysis

Data Structure:

Data Cleaning:

Data Description:


EDA - Analyzing individual attributes to understand the data patterns


Insights from Categorical Data

Insights from Numerical Data


Univariate Analysis

Analyzing the count and percentage of Categorical attributes using a bar chart

Overall observations of the Categorical attributes

Analyzing the Numerical attributes using Histogram and Box Plots

Analyzing the Age of the Customers

Observations:

Analyzing the Experience of the Customers

Observations:

Analyzing the Income of the Customers

Observations:

Analyzing the CCAvg value of the Customers

Observations:

Analyzing the Mortgage value of the Customers

Observations:

Bivariate Analysis - Visualizing the association between Numerical variables and their Correlation


Observations:

Observations:

Analyzing the Categorical attributes with Personal Loan

Observation:

Analyzing the Numerical attributes with Personal Loan

Observation:

Education vs Personal Loan

Observations:

Family vs Personal Loan

Observations:

Securities Account vs Personal Loan

Observations:

CD Account vs Personal Loan

Observations:

Online Users vs Personal Loan

Observations:

Credit Card Users vs Personal Loan

Observations:

CC Average vs Personal Loan

Observations:

Zipcode Group vs Income vs Personal Loan

Observations:

Zipcode Group vs Income vs Age vs Personal Loan

Observations:

Income vs Education vs Personal Loan

Observations:

Age vs Mortgage vs Personal Loan

Observations:

Age vs Family vs Personal Loan

Observations:

Income vs Age vs Personal Loan

Observations:

Income vs Mortgage vs Personal Loan

Observations:

Age vs Experience vs Personal Loan

Observations:

Income vs Experience vs Personal Loan

Observations:

Summary of EDA

- Insights from Categorical Data
- Insights from Numerical Data
- Univariate Analysis Summary
- Bivariate Analysis Summary

Model Building

Model evaluation criterion:

The model can make two kinds of wrong predictions:

  1. Predicting a customer will apply for the loan when in reality the customer would not apply (False Positive) - Loss of resources

  2. Predicting a customer will not apply for the loan when in reality the customer would have applied (False Negative) - Loss of opportunity
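The two error types can be counted directly from actual and predicted labels. The sketch below uses made-up labels (1 = customer applies for the loan, 0 = does not); the helper name `error_counts` is hypothetical and not part of the notebook.

```python
def error_counts(y_true, y_pred):
    """Count each outcome type, treating 1 (applies for loan) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),  # loss of resources
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),  # loss of opportunity
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }

# Illustrative labels only, not taken from the campaign data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = error_counts(y_true, y_pred)
print(counts)
```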

Which case is more important?

How can we get more customers to avail of the Personal Loan, i.e. how do we reduce False Negatives?

Split Data

Modeling using Logistic Regression

Checking model performance on training set

Checking performance on test set

ROC Model Prediction

ROC On Training Set

ROC On Testing Set

AUC - ROC Model Prediction

Checking AUC-ROC model performance on training set

Checking AUC-ROC model performance on testing set

Precision - Recall Curve Model Prediction

Checking Precision - Recall Curve model performance on training set

Checking Precision - Recall Curve model performance on testing set

Model Performance Summary

Summary of Model Building

Coefficients

Coefficient interpretations

Converting coefficients to odds
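A logistic regression coefficient is a change in log-odds, so exponentiating it gives an odds ratio. The sketch below uses hypothetical coefficient values (not the fitted values from this notebook) to show the conversion.

```python
import math

# Hypothetical coefficients for illustration; the fitted values will differ.
coefficients = {"Income": 0.05, "Education": 1.2, "Online": -0.3}

# exp(coefficient) is the multiplicative change in the odds of taking
# the loan for a one-unit increase in the attribute.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# The same information expressed as a percentage change in odds.
pct_change = {name: (ratio - 1) * 100 for name, ratio in odds_ratios.items()}

for name in coefficients:
    print(f"{name}: odds ratio {odds_ratios[name]:.3f}, change {pct_change[name]:+.1f}%")
```

An odds ratio above 1 means the attribute raises the odds of taking the loan, and a ratio below 1 means it lowers them.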


Build Decision Tree Model


Split Data

Build Decision Tree Model - Default values

Checking model performance on Training set

Checking model performance on Test set

Visualizing the Decision Tree

Observations:

Build Decision Tree Model - Max Depth (5)

Checking performance on training set

Checking performance on testing set

Visualizing the Decision Tree

Observations:

Build Decision Tree Model - Hyperparameter tuning with GridSearch to reduce overfitting
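A minimal sketch of such a grid search with scikit-learn's `GridSearchCV` follows. The synthetic data, the parameter grid, and the choice of recall as the scoring metric are assumptions for illustration, not the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data (about 10% positives).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Assumed grid: shallower trees and larger leaves constrain the model,
# which is what limits overfitting.
param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5, 10]}

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",  # assumed metric; swap in "precision" or "f1" as needed
    cv=5,
)
grid.fit(X_train, y_train)

best_tree = grid.best_estimator_
print(grid.best_params_)
```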

Checking performance on training set

Checking performance on testing set

Visualizing the Decision Tree

Observations:

Build using Cost Complexity Pruning - DecisionTreeClassifier

Total Impurity vs effective alpha for training set

Number of Nodes/Depth vs Alpha

Accuracy vs alpha for training and testing sets
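The quantities plotted in these steps can be computed as sketched below, using scikit-learn's `cost_complexity_pruning_path`. The synthetic data and variable names are assumptions; the notebook's own data and plots are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Effective alphas and the total leaf impurity at each pruning step.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_train, y_train
)
# Guard against tiny negative alphas caused by floating-point error.
ccp_alphas = [max(float(a), 0.0) for a in path.ccp_alphas]
impurities = list(path.impurities)

# Fit one tree per alpha and record its size and accuracy.
trees = [
    DecisionTreeClassifier(random_state=1, ccp_alpha=a).fit(X_train, y_train)
    for a in ccp_alphas
]
node_counts = [t.tree_.node_count for t in trees]
depths = [t.get_depth() for t in trees]
train_acc = [t.score(X_train, y_train) for t in trees]
test_acc = [t.score(X_test, y_test) for t in trees]

print(list(zip(ccp_alphas, node_counts))[:5])
```

Plotting `impurities`, `node_counts`, `depths`, `train_acc`, and `test_acc` against `ccp_alphas` gives the three charts named above.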

The maximum Accuracy Score occurs at an alpha of 0.040, but if we choose that value the decision tree will have only a root node and we would lose the business rules. The best alpha for this accuracy is calculated below:

Since accuracy isn't the right metric for our data, we want a higher Precision/F1 Score instead
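To illustrate why accuracy can mislead on imbalanced data, the sketch below scores a trivial model that predicts "no loan" for everyone. The 90/10 class split and the helper name `scores` are made up for illustration.

```python
def scores(y_true, y_pred):
    """Return accuracy, precision, recall and F1 with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    # Treat undefined ratios (no positive predictions or labels) as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, precision, recall, f1

# Illustrative imbalance: 90 non-buyers, 10 buyers.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # a model that never predicts a loan

accuracy, precision, recall, f1 = scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

The trivial model looks strong on accuracy yet identifies no loan customers at all, which is the same failure as a tree pruned down to its root node.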

Training the model with the Precision score using Cost Complexity Pruning - DecisionTreeClassifier

Checking performance on training set

Checking performance on testing set

Visualizing the Decision Tree

Comparing all the decision tree models

Observations:

Summary of Decision Tree Model Analysis

Comparisons - Logistic Regression VS Decision Tree

Logistic Regression

Decision Tree Analysis

Recommendations:

Summary Section References:

- Summary of Data Analysis
- EDA Summary
- Model evaluation criterion
- Model Building Summary
- Decision Tree Model Analysis Summary
- Comparisons - Logistic Regression VS Decision Tree
- Recommendations